A Survey on Methods to Handle Imbalance Dataset

Author

  • Apurva Sonak
Abstract

Imbalanced data sets, a problem often found in real-world applications, can seriously degrade the classification performance of machine learning algorithms. There have been many attempts to deal with the classification of imbalanced data sets. One way to handle the problem is to rebalance the data artificially by oversampling and/or undersampling.
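
As a rough illustration of the rebalancing idea mentioned in the abstract, the sketch below shows plain random oversampling and random undersampling using only the Python standard library. It is not taken from the surveyed paper: the function name `random_resample`, its `mode` and `seed` parameters, and the choice to match the largest (or smallest) class size are all assumptions made for this example.

```python
import random
from collections import defaultdict


def random_resample(samples, labels, mode="over", seed=0):
    """Rebalance a labelled data set by random over- or undersampling.

    Hypothetical helper for illustration only.
    mode="over":  duplicate randomly chosen items of each smaller class
                  until every class is as large as the largest class.
    mode="under": randomly keep only as many items per class as the
                  smallest class contains.
    """
    if mode not in ("over", "under"):
        raise ValueError("mode must be 'over' or 'under'")
    rng = random.Random(seed)

    # Group the samples by class label.
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)

    sizes = [len(items) for items in by_class.values()]
    target = max(sizes) if mode == "over" else min(sizes)

    out_samples, out_labels = [], []
    for y, items in by_class.items():
        if mode == "over":
            # Keep every original item and add random duplicates (with replacement).
            chosen = items + rng.choices(items, k=target - len(items))
        else:
            # Keep a random subset (without replacement).
            chosen = rng.sample(items, target)
        out_samples.extend(chosen)
        out_labels.extend([y] * target)
    return out_samples, out_labels


if __name__ == "__main__":
    X = list(range(10))
    y = [0] * 8 + [1] * 2  # class 1 is the minority class
    _, y_over = random_resample(X, y, mode="over")
    _, y_under = random_resample(X, y, mode="under")
    print(y_over.count(0), y_over.count(1))
    print(y_under.count(0), y_under.count(1))
```

Oversampling keeps all of the original data at the cost of repeated minority items, while undersampling discards majority items; which trade-off is acceptable depends on the data set and the classifier.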

Similar resources

A Survey on Methods for Solving Data Imbalance Problem for Classification

The term “data imbalance” in classification refers to a well-established phenomenon in which a data set contains unbalanced class distributions. A data set is called unbalanced if it contains at least one class that is represented by very few examples. A range of solutions have been proposed for the problem of data imbalance including data sampling, cost evaluation of model, bagging, boosting, Genetic Progr...

Extracting Predictor Variables to Construct Breast Cancer Survivability Model with Class Imbalance Problem

Applying data mining methods as a decision support system is of great benefit in predicting the survival of new patients. It also has great potential to help health researchers investigate the relationship between risk factors and cancer survival. But due to the imbalanced nature of datasets associated with breast cancer survival, the accuracy of survival prognosis models is a challenging issue...

Handling Data Imbalance in Automatic Facial Action Intensity Estimation

Automatic Action Unit (AU) intensity estimation is a key problem in facial expression analysis. But limited research attention has been paid to the inherent class imbalance, which usually leads to suboptimal performance. To handle the imbalance, we propose (1) a novel multiclass under-sampling method and (2) its use in an ensemble. We compare our approach with state of the art sampling methods ...

High performance of the support vector machine in classifying hyperspectral data using a limited dataset

To prospect mineral deposits at regional scale, recognition and classification of hydrothermal alteration zones using remote sensing data is a popular strategy. Due to the large number of spectral bands, classification of the hyperspectral data may be negatively affected by the Hughes phenomenon. A practical way to handle the Hughes problem is preparing a lot of training samples until the size ...

A hierarchical Convolutional Neural Network for Segmentation of Stroke Lesion in 3D Brain MRI

Introduction: Brain tumors such as glioma are among the most aggressive lesions, which result in a very short life expectancy in patients. Image segmentation is highly essential in medical image analysis with applications, particularly in clinical practices to treat brain tumors. Accurate segmentation of magnetic resonance data is crucial for diagnostic purposes, planning surgical treatments, a...

Publication date: 2015